Project: Gapminder Dataset Analysis¶

Table of Contents¶

  • Introduction
  • Data Wrangling
  • Exploratory Data Analysis
  • Conclusions

Introduction¶

Three datasets were collected from Gapminder to serve as samples for this analysis: population_total.csv, gnicap_atm_con.csv, and life_expectancy_years.csv.

The first question we'll be exploring is: Is life expectancy affected by the population size?

The second question we'll be exploring is: Is there a correlation between income (GNI per capita) and life expectancy?

For ease of reference throughout the analysis, I will label gnicap_atm_con as df_inc.

In [1]:
#Importing library packages to be used throughout project

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

%matplotlib inline

Data Wrangling¶

General Properties¶

In [2]:
# Load data
df_pop = pd.read_csv('population_total.csv')
df_lyf = pd.read_csv('life_expectancy_years.csv')
df_inc = pd.read_csv('gnicap_atm_con.csv')

Exploring the 3 different datasets¶

In [3]:
#confirming right data loaded(population)
print(df_pop.shape)
df_pop.head()
(197, 302)
Out[3]:
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100
0 Afghanistan 3.28M 3.28M 3.28M 3.28M 3.28M 3.28M 3.28M 3.28M 3.28M ... 76.6M 76.4M 76.3M 76.1M 76M 75.8M 75.6M 75.4M 75.2M 74.9M
1 Angola 1.57M 1.57M 1.57M 1.57M 1.57M 1.57M 1.57M 1.57M 1.57M ... 168M 170M 172M 175M 177M 179M 182M 184M 186M 188M
2 Albania 400k 402k 404k 405k 407k 409k 411k 413k 414k ... 1.33M 1.3M 1.27M 1.25M 1.22M 1.19M 1.17M 1.14M 1.11M 1.09M
3 Andorra 2650 2650 2650 2650 2650 2650 2650 2650 2650 ... 63k 62.9k 62.9k 62.8k 62.7k 62.7k 62.6k 62.5k 62.5k 62.4k
4 United Arab Emirates 40.2k 40.2k 40.2k 40.2k 40.2k 40.2k 40.2k 40.2k 40.2k ... 12.3M 12.4M 12.5M 12.5M 12.6M 12.7M 12.7M 12.8M 12.8M 12.9M

5 rows × 302 columns

In [4]:
#confirming right data loaded(life expectancy)
print(df_lyf.shape)
df_lyf.head()
(195, 302)
Out[4]:
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100
0 Afghanistan 28.2 28.2 28.2 28.2 28.2 28.2 28.1 28.1 28.1 ... 75.5 75.7 75.8 76.0 76.1 76.2 76.4 76.5 76.6 76.8
1 Angola 27.0 27.0 27.0 27.0 27.0 27.0 27.0 27.0 27.0 ... 78.8 79.0 79.1 79.2 79.3 79.5 79.6 79.7 79.9 80.0
2 Albania 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 ... 87.4 87.5 87.6 87.7 87.8 87.9 88.0 88.2 88.3 88.4
3 Andorra NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 United Arab Emirates 30.7 30.7 30.7 30.7 30.7 30.7 30.7 30.7 30.7 ... 82.4 82.5 82.6 82.7 82.8 82.9 83.0 83.1 83.2 83.3

5 rows × 302 columns

The life expectancy dataset contains null values; Andorra, for example, has no value in any of the displayed years.

In [5]:
#confirming right data loaded(GNIperCap)
print(df_inc.shape)
df_inc.head()
(191, 252)
Out[5]:
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050
0 Afghanistan 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 ... 751 767 783 800 817 834 852 870 888 907
1 Angola 517.0 519.0 522.0 524.0 525.0 528.0 531.0 533.0 536.0 ... 2770 2830 2890 2950 3010 3080 3140 3210 3280 3340
2 Albania 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 ... 9610 9820 10k 10.2k 10.5k 10.7k 10.9k 11.1k 11.4k 11.6k
3 United Arab Emirates 738.0 740.0 743.0 746.0 749.0 751.0 754.0 757.0 760.0 ... 47.9k 48.9k 50k 51k 52.1k 53.2k 54.3k 55.5k 56.7k 57.9k
4 Argentina 794.0 797.0 799.0 802.0 805.0 808.0 810.0 813.0 816.0 ... 12.8k 13.1k 13.4k 13.6k 13.9k 14.2k 14.5k 14.8k 15.2k 15.5k

5 rows × 252 columns

Exploring for any duplicates¶

In [6]:
df_pop.duplicated().sum(), df_inc.duplicated().sum(), df_lyf.duplicated().sum()
Out[6]:
(0, 0, 0)

No duplicated data was found.

Exploring for different datatypes in each dataset¶

In [7]:
df_pop.dtypes.unique()
Out[7]:
array([dtype('O')], dtype=object)
In [8]:
df_lyf.dtypes.unique()
Out[8]:
array([dtype('O'), dtype('float64')], dtype=object)
In [9]:
df_inc.dtypes.unique()
Out[9]:
array([dtype('O'), dtype('float64')], dtype=object)

All three datasets share the same layout: one row per country and one column per year. Some of them contain null values that need to be cleaned, and I chose to drop the rows with missing values in order to minimize errors. The population dataset is also stored entirely as strings (object dtype) because of suffixes such as k and M, so it will need to be converted to numbers later.

It would also be complex to work with separate datasets at once, so I'll transform each of them into a table with 3 columns and then merge them together, given that they share the country and year references.

Data Cleaning¶

Now I'll take care of the null values using dropna. From there I will transform each dataset into a 3-column table and finally merge all three tables together!
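
Before dropping rows, it can help to see which countries would be lost. The following is a minimal sketch on a toy frame; the name `toy` and its values are made up for illustration and are not taken from the Gapminder files.

```python
import numpy as np
import pandas as pd

# Toy wide-format frame standing in for df_lyf (illustrative values only)
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "1800": [28.2, np.nan, 35.4],
    "1801": [28.2, np.nan, 35.4],
})

# Boolean mask: True for rows that contain at least one null
has_null = toy.isnull().any(axis=1)

# Countries that dropna() would remove
dropped = toy.loc[has_null, "country"].tolist()
print(dropped)

# dropna() keeps only the complete rows
clean = toy.dropna()
print(clean.shape)
```

Applied to the real frames, the same mask would list the countries removed by each dropna call.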

Cleaning up null values¶

In [10]:
df_pop.isnull().sum()
Out[10]:
country    0
1800       0
1801       0
1802       0
1803       0
          ..
2096       0
2097       0
2098       0
2099       0
2100       0
Length: 302, dtype: int64
In [11]:
#Dropping all null value rows found in the life expectancy dataset
df_lyf = df_lyf.dropna()
df_lyf
Out[11]:
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100
0 Afghanistan 28.2 28.2 28.2 28.2 28.2 28.2 28.1 28.1 28.1 ... 75.5 75.7 75.8 76.0 76.1 76.2 76.4 76.5 76.6 76.8
1 Angola 27.0 27.0 27.0 27.0 27.0 27.0 27.0 27.0 27.0 ... 78.8 79.0 79.1 79.2 79.3 79.5 79.6 79.7 79.9 80.0
2 Albania 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 ... 87.4 87.5 87.6 87.7 87.8 87.9 88.0 88.2 88.3 88.4
4 United Arab Emirates 30.7 30.7 30.7 30.7 30.7 30.7 30.7 30.7 30.7 ... 82.4 82.5 82.6 82.7 82.8 82.9 83.0 83.1 83.2 83.3
5 Argentina 33.2 33.2 33.2 33.2 33.2 33.2 33.2 33.2 33.2 ... 86.2 86.3 86.5 86.5 86.7 86.8 86.9 87.0 87.1 87.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Samoa 25.4 25.4 25.4 25.4 25.4 25.4 25.4 25.4 25.4 ... 79.8 79.9 80.0 80.1 80.3 80.4 80.5 80.6 80.7 80.8
191 Yemen 23.4 23.4 23.4 23.4 23.4 23.4 23.4 23.4 23.4 ... 76.9 77.0 77.1 77.3 77.4 77.5 77.6 77.8 77.9 78.0
192 South Africa 33.5 33.5 33.5 33.5 33.5 33.5 33.5 33.5 33.5 ... 76.4 76.5 76.7 76.8 77.0 77.1 77.3 77.4 77.5 77.7
193 Zambia 32.6 32.6 32.6 32.6 32.6 32.6 32.6 32.6 32.6 ... 75.8 76.0 76.1 76.3 76.4 76.5 76.7 76.8 77.0 77.1
194 Zimbabwe 33.7 33.7 33.7 33.7 33.7 33.7 33.7 33.7 33.7 ... 73.3 73.4 73.5 73.7 73.8 73.9 74.0 74.2 74.3 74.4

186 rows × 302 columns

In [12]:
df_lyf.isnull().sum()
Out[12]:
country    0
1800       0
1801       0
1802       0
1803       0
          ..
2096       0
2097       0
2098       0
2099       0
2100       0
Length: 302, dtype: int64
In [13]:
#Making sure there are no null values in the df_inc rows
df_inc = df_inc.dropna()
df_inc
Out[13]:
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050
0 Afghanistan 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 ... 751 767 783 800 817 834 852 870 888 907
1 Angola 517.0 519.0 522.0 524.0 525.0 528.0 531.0 533.0 536.0 ... 2770 2830 2890 2950 3010 3080 3140 3210 3280 3340
2 Albania 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 207.0 ... 9610 9820 10k 10.2k 10.5k 10.7k 10.9k 11.1k 11.4k 11.6k
3 United Arab Emirates 738.0 740.0 743.0 746.0 749.0 751.0 754.0 757.0 760.0 ... 47.9k 48.9k 50k 51k 52.1k 53.2k 54.3k 55.5k 56.7k 57.9k
4 Argentina 794.0 797.0 799.0 802.0 805.0 808.0 810.0 813.0 816.0 ... 12.8k 13.1k 13.4k 13.6k 13.9k 14.2k 14.5k 14.8k 15.2k 15.5k
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
186 Samoa 373.0 373.0 373.0 373.0 373.0 373.0 373.0 374.0 374.0 ... 5330 5440 5560 5670 5790 5920 6040 6170 6300 6440
187 Yemen 197.0 198.0 198.0 199.0 199.0 200.0 200.0 201.0 202.0 ... 1440 1470 1500 1530 1560 1590 1630 1660 1700 1730
188 South Africa 800.0 791.0 782.0 773.0 765.0 724.0 724.0 786.0 687.0 ... 7630 7790 7960 8130 8300 8480 8660 8840 9030 9220
189 Zambia 213.0 214.0 215.0 215.0 215.0 216.0 216.0 217.0 217.0 ... 1260 1290 1320 1340 1370 1400 1430 1460 1490 1520
190 Zimbabwe 443.0 444.0 444.0 445.0 445.0 446.0 446.0 446.0 447.0 ... 1560 1590 1620 1660 1690 1730 1770 1800 1840 1880

190 rows × 252 columns

In [14]:
df_inc.isnull().sum()
Out[14]:
country    0
1800       0
1801       0
1802       0
1803       0
          ..
2046       0
2047       0
2048       0
2049       0
2050       0
Length: 252, dtype: int64

Transforming Datasets into Tables of 3 columns respectively¶

In [15]:
df_pop = df_pop.melt(id_vars=["country"], 
        var_name="year", 
        value_name="pop")
df_pop
Out[15]:
country year pop
0 Afghanistan 1800 3.28M
1 Angola 1800 1.57M
2 Albania 1800 400k
3 Andorra 1800 2650
4 United Arab Emirates 1800 40.2k
... ... ... ...
59292 Samoa 2100 310k
59293 Yemen 2100 53.2M
59294 South Africa 2100 79.2M
59295 Zambia 2100 81.5M
59296 Zimbabwe 2100 31M

59297 rows × 3 columns

In [16]:
df_inc = df_inc.melt(id_vars=["country"], 
        var_name="year", 
        value_name="income")
df_inc
Out[16]:
country year income
0 Afghanistan 1800 207.0
1 Angola 1800 517.0
2 Albania 1800 207.0
3 United Arab Emirates 1800 738.0
4 Argentina 1800 794.0
... ... ... ...
47685 Samoa 2050 6440
47686 Yemen 2050 1730
47687 South Africa 2050 9220
47688 Zambia 2050 1520
47689 Zimbabwe 2050 1880

47690 rows × 3 columns

In [17]:
df_lyf = df_lyf.melt(id_vars=["country"], 
        var_name="year", 
        value_name="life_exp")
df_lyf
Out[17]:
country year life_exp
0 Afghanistan 1800 28.2
1 Angola 1800 27.0
2 Albania 1800 35.4
3 United Arab Emirates 1800 30.7
4 Argentina 1800 33.2
... ... ... ...
55981 Samoa 2100 80.8
55982 Yemen 2100 78.0
55983 South Africa 2100 77.7
55984 Zambia 2100 77.1
55985 Zimbabwe 2100 74.4

55986 rows × 3 columns

Merging the 3 datasets into one (df1)¶

In [18]:
df1 = df_pop.merge(df_inc,on=['country','year']).merge(df_lyf,on=['country','year'])
print(df1)
                    country  year    pop income  life_exp
0               Afghanistan  1800  3.28M  207.0      28.2
1                    Angola  1800  1.57M  517.0      27.0
2                   Albania  1800   400k  207.0      35.4
3      United Arab Emirates  1800  40.2k  738.0      30.7
4                 Argentina  1800   534k  794.0      33.2
...                     ...   ...    ...    ...       ...
46179                 Samoa  2050   267k   6440      74.3
46180                 Yemen  2050  48.1M   1730      72.2
46181          South Africa  2050  75.5M   9220      70.9
46182                Zambia  2050  39.1M   1520      69.8
46183              Zimbabwe  2050  23.9M   1880      67.6

[46184 rows x 5 columns]
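
The merged table has 46184 rows, fewer than the 47690 rows of df_inc, because `merge` defaults to an inner join and drops any country-year pair missing from one of the tables. One way to see which pairs fail to match is an outer join with `indicator=True`. The sketch below uses made-up toy tables, not the real data.

```python
import pandas as pd

# Toy long-format tables (illustrative values only)
pop = pd.DataFrame({"country": ["A", "B"], "year": ["1800", "1800"], "pop": ["1k", "2k"]})
lyf = pd.DataFrame({"country": ["A", "C"], "year": ["1800", "1800"], "life_exp": [30.0, 31.0]})

# Outer join keeps every key; the _merge column records where each key was found
check = pop.merge(lyf, on=["country", "year"], how="outer", indicator=True)

# Rows present in only one table are the ones an inner join would drop
unmatched = check[check["_merge"] != "both"]
print(unmatched[["country", "year", "_merge"]])
```
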
In [19]:
df1['year']=df1['year'].astype(int)
df1['life_exp']=df1['life_exp'].astype(float)
In [20]:
# Expand the k/M/B suffixes into arithmetic (e.g. '10k' -> '10*1e3'), evaluate, and cast to integers
df1['income'] = df1['income'].replace({'k': '*1e3', 'M': '*1e6'}, regex=True).map(pd.eval).astype(int)
df1['pop'] = df1['pop'].replace({'k': '*1e3', 'M': '*1e6', 'B': '*1e9'}, regex=True).map(pd.eval).astype(int)
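
An alternative to evaluating strings is an explicit parser for the suffixed values. The helper `parse_suffixed` below is hypothetical (it is not part of pandas) and assumes the only suffixes are k, M and B, as seen in the tables above. Rounding before the integer cast avoids truncating values that fall just below a whole number after floating-point multiplication.

```python
import pandas as pd

# Multipliers for the suffixes seen in the Gapminder files (assumed complete)
SUFFIXES = {"k": 1e3, "M": 1e6, "B": 1e9}

def parse_suffixed(value):
    """Convert values such as '3.28M', '400k', '2650' or 207.0 to a float."""
    text = str(value).strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)

# Mixed strings and floats, as in the melted columns
raw = pd.Series(["3.28M", "400k", "2650", 207.0])
parsed = raw.map(parse_suffixed)

# Round first, then cast, so 3279999.9999 would not become 3279999
as_int = parsed.round().astype("int64")
print(as_int.tolist())
```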

Exploring dataset for datatypes, duplicates, errors and null values¶

In [21]:
df1.dtypes
Out[21]:
country      object
year          int32
pop           int32
income        int32
life_exp    float64
dtype: object
In [22]:
df1.duplicated().sum()
Out[22]:
0
In [23]:
df1.isnull().sum()
Out[23]:
country     0
year        0
pop         0
income      0
life_exp    0
dtype: int64
In [24]:
#trim data, working with data from year:1980-2020
df1=df1.loc[33153:40513]
df1.head()
Out[24]:
country year pop income life_exp
33153 Cameroon 1980 8620000 2260 55.4
33154 Congo, Dem. Rep. 1980 26400000 533 52.1
33155 Congo, Rep. 1980 1780000 1270 52.8
33156 Colombia 1980 26900000 2170 68.9
33157 Comoros 1980 308000 1340 54.5
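
The slice above selects by row label, so it starts at Cameroon in 1980 rather than at the first country of that year, and it stops partway through 2020. A more robust option is to filter on the year column itself. This is a minimal sketch on a made-up toy table, assuming `year` has already been converted to integers as in the cell above.

```python
import pandas as pd

# Toy merged table (illustrative values only); 'year' is an integer column
toy = pd.DataFrame({
    "country": ["A", "B", "A", "B", "A", "B"],
    "year": [1979, 1979, 1980, 1980, 2021, 2021],
    "life_exp": [50.0, 70.0, 50.5, 70.2, 60.0, 80.0],
})

# Keep every country for each year in the inclusive range 1980-2020
trimmed = toy[toy["year"].between(1980, 2020)]
print(trimmed)
```

Filtering this way keeps the same set of years for every country, whatever the row order.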

Exploratory Data Analysis¶

Now that the data is cleaned, trimmed and set, we can move on to the analysis. Let's get it!!

Research Question 1 : Is there any correlation between life expectancy and standard of living through the years?¶

The idea is to find out whether life expectancy in less developed countries differs significantly from that in developed countries. We will also verify whether their populations have any impact on life expectancy.

Case Study: Cameroon, France¶

H0: Life expectancy in developed countries = life expectancy in less developed countries

H1: Life expectancy in developed countries != life expectancy in less developed countries
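
These hypotheses are stated here, but the notebook does not run a formal test on them. One possible test is Welch's t-test, sketched below with the standard library on the first five life expectancy values printed for each country in this notebook. This is only an illustration: yearly values of a trend are not independent samples, so the result should be read with caution. The helper name `welch_t` is an assumption of this sketch.

```python
import math
import statistics

# First five yearly life expectancy values (1980-1984) as printed in this notebook
cameroon = [55.4, 55.8, 56.3, 56.8, 57.1]
france = [74.7, 74.9, 75.1, 75.3, 75.6]

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

t_stat, dof = welch_t(france, cameroon)
print(t_stat, dof)
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom would argue against H0. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.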

In [25]:
#query data with country = Cameroon
df_cm = df1.query('country == "Cameroon"')
df_cm.head()
Out[25]:
country year pop income life_exp
33153 Cameroon 1980 8620000 2260 55.4
33337 Cameroon 1981 8890000 2360 55.8
33521 Cameroon 1982 9170000 2460 56.3
33705 Cameroon 1983 9460000 2590 56.8
33889 Cameroon 1984 9760000 2720 57.1
In [26]:
#Write a function to plot layouts; this is to avoid duplicates and confusion
def plotter(x):
    
    test = x.update_layout(barmode='group', xaxis_tickangle=-45, title={
        
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
    
    return test
In [27]:
#df_fr['life_exp'].hist(figsize=(15,5));
fig = px.histogram(df_cm, x='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Life Expectancy Histogram - Cameroon');

plotter(fig)

We deduce from the above histogram that life expectancy in Cameroon from 1980-2020 ranges from 54 to 63 years, with the highest frequency between 57 and 59 years. The distribution is skewed to the right (a longer tail toward the higher values), signifying that in most years life expectancy is between 54 and 59 years.

In [28]:
#query data with country = France
df_fr = df1.query('country == "France"')
df_fr.head()
Out[28]:
country year pop income life_exp
33176 France 1980 53900000 29000 74.7
33360 France 1981 54100000 29200 74.9
33544 France 1982 54400000 29800 75.1
33728 France 1983 54700000 30100 75.3
33912 France 1984 55000000 30400 75.6
In [29]:
#df_fr['life_exp'].hist(figsize=(15,5));
fig = px.histogram(df_fr, x='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Life Expectancy Histogram - France')

plotter(fig)

We deduce from the above histogram that the most frequent life expectancy values in France lie above 82 years. The values range from about 75 to 83 years and are spread fairly symmetrically around a mean of about 79 years, well above those of Cameroon (54-63 years). Note that the trimmed data for France covers 1980-2019.

In [30]:
#plotting relationship between life expectancy and years
#fig1=px.bar(df_cm, x='year', y='life_exp', height=320, labels={'life_exp':'Life Expectancy'})
#fig1.update_layout(barmode='group', xaxis_tickangle=-45)
fig = px.bar(df_cm, x='year', y='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Relationship between Life Expectancy and Years - Cameroon')

plotter(fig)
In [31]:
# Scatter plot for Cameroon only; the France scatter plot is drawn in its own cell further down
fig = px.scatter(df_cm, x='year', y='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Relationship between Life Expectancy and Years - Cameroon')

plotter(fig)

From the bar chart and scatter plot above, we observe a steady rise from 2002, signifying that life expectancy in Cameroon has grown from 54 to 63+ years over the last 20 years.

In [32]:
fig = px.bar(df_fr, x='year', y='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Relationship between Life Expectancy and Years - France')

plotter(fig)
In [33]:
fig = px.scatter(df_fr, x='year', y='life_exp', height=320, labels={'life_exp':'Life Expectancy'}, title='Relationship between Life Expectancy and Years - France')

plotter(fig)

From the bar chart and the scatter plot above, we can see a steady rise throughout the years, signifying that life expectancy in France has grown from about 75 to 82+ years between 1980 and 2019.

In [34]:
fig = px.bar(df_cm, x='year', y='pop', color='life_exp', height=320, labels={'pop':'Population Cameroon'}, title='Life Expectancy with Respect to Pop Growth per Year - Cameroon')

plotter(fig)
In [35]:
fig=px.scatter(df_cm, x='year', y='pop', color='life_exp', height=320, labels={'pop':'Population Cameroon'}, title='Life Expectancy with Respect to Pop Growth per Year - Cameroon')

plotter(fig)
In [36]:
fig=px.bar(df_fr, x='year', y='pop', color='life_exp', height=320, labels={'pop':'Population France'}, title='Life Expectancy with Respect to Pop Growth per Year - France')

plotter(fig)
In [37]:
fig=px.scatter(df_fr, x='year', y='pop', color='life_exp', height=320, labels={'pop':'Population France'}, title='Life Expectancy with Respect to Pop Growth per Year - France')

plotter(fig)

The relationship graphs above clearly show the steady rise in life expectancy in both countries and the gap between them. France has the larger population (about 65M people by 2019) compared with Cameroon (26.5M people by 2020). By 2019, life expectancy in France was already at 82+ years, far higher than Cameroon's (62+ years in 2020).

Research Question 2 : Is there any correlation between Life Expectancy and Income per Person?¶

In [38]:
#compute to get descriptive statistics
df_cm.describe()
Out[38]:
year pop income life_exp
count 41.000000 4.100000e+01 41.000000 41.000000
mean 2000.000000 1.625122e+07 1672.926829 57.495122
std 11.979149 5.311372e+06 476.358079 2.438642
min 1980.000000 8.620000e+06 976.000000 54.200000
25% 1990.000000 1.180000e+07 1390.000000 55.700000
50% 2000.000000 1.550000e+07 1590.000000 57.200000
75% 2010.000000 2.030000e+07 1750.000000 58.500000
max 2020.000000 2.650000e+07 2720.000000 63.500000
In [39]:
#compute to get descriptive statistics
df_fr.describe()
Out[39]:
year pop income life_exp
count 40.000000 4.000000e+01 40.000000 40.000000
mean 1999.500000 5.945750e+07 40265.000000 79.152500
std 11.690452 3.550362e+06 7163.209065 2.599999
min 1980.000000 5.390000e+07 29000.000000 74.700000
25% 1989.750000 5.662500e+07 34650.000000 77.150000
50% 1999.500000 5.885000e+07 40700.000000 79.050000
75% 2009.250000 6.260000e+07 45450.000000 81.450000
max 2019.000000 6.510000e+07 52800.000000 82.900000
In [40]:
#df_cm['income'].hist(figsize=(15,5));
fig = px.histogram(df_cm, x='income', height=320, labels={'income':'Income'}, title='Income Histogram - Cameroon');

plotter(fig)

The income histogram is skewed to the right, with the highest frequency between 1500 and 1700 USD. Each observation is one year, and only about a quarter of the years have a GNI per capita above 1750 USD.
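
Skew direction can also be checked numerically rather than read off the plot. Below is a small sketch using pandas `Series.skew()` and a comparison of mean and median. The values are made up for illustration and are not the real Cameroon series.

```python
import pandas as pd

# Illustrative values only (not the real Cameroon income series)
income = pd.Series([1000, 1400, 1500, 1600, 1700, 2700])

# Sample skewness: a positive value indicates a longer right tail
skew = income.skew()

# A mean above the median points in the same direction
mean, median = income.mean(), income.median()
print(skew, mean, median)
```

On the real data, the same two checks could be run on `df_cm['income']` and `df_fr['income']`.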

In [41]:
#Relationship between life expectancy and income with respect to the population growth in Cameroon on bar chart
fig=px.bar(df_cm, x='pop', y='income', height=320, labels={'pop':'Population'}, color='life_exp', title='Life Expectancy And Income Per Year - Cameroon')

plotter(fig)
In [42]:
#Scatter plot for relationship between life expectancy and income with respect to the population growth in Cameroon on Scatter chart
fig=px.scatter(df_cm, x='pop', y='income', height=320, labels={'pop':'Population'}, color='life_exp', title='Life Expectancy And Income Per Year - Cameroon')

plotter(fig)

In most of the years, GNI per capita in Cameroon was below 2000 USD. In the years when it was above 2000 USD, life expectancy did not reach 59 years.

In [43]:
#df_fr['income'].hist(figsize=(15,5));
fig = px.histogram(df_fr, x='income', height=320, labels={'income':'Income'}, title='Income Histogram - France');

plotter(fig)
In [44]:
#Relationship between life expectancy and income with respect to the population growth in France on bar chart
fig= px.bar(df_fr, x='pop', y='income', height=320, labels={'pop':'Population'}, color='life_exp', title='Life Expectancy And Income Per Year - France')

plotter(fig)
In [45]:
#Scatter plot for relationship between life expectancy and income with respect to the population growth in France on scatter chart
fig= px.scatter(df_fr, x='pop', y='income', height=320, labels={'pop':'Population'}, color='life_exp', title='Life Expectancy And Income Per Year - France')

plotter(fig)

Comparing the plots for Cameroon and France, we see some differences. In Cameroon, the years with higher GNI per capita show lower life expectancy; the GNI values above 2000 USD occur in the early 1980s, when the population was smallest and life expectancy was below 59 years. In contrast, in France life expectancy increases together with GNI per capita and population.
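
The research question asks about correlation, so a correlation coefficient is a natural complement to the plots. The sketch below computes the Pearson correlation per country on the first five rows printed for each country earlier in this notebook. These rows cover 1980-1984 only, so the numbers do not represent the whole period; the frame name `sample` is an assumption of this sketch.

```python
import pandas as pd

# First five rows per country as printed earlier in this notebook (1980-1984)
sample = pd.DataFrame({
    "country": ["Cameroon"] * 5 + ["France"] * 5,
    "income": [2260, 2360, 2460, 2590, 2720, 29000, 29200, 29800, 30100, 30400],
    "life_exp": [55.4, 55.8, 56.3, 56.8, 57.1, 74.7, 74.9, 75.1, 75.3, 75.6],
})

# Pearson correlation between income and life expectancy, computed per country
corr = {
    name: group["income"].corr(group["life_exp"])
    for name, group in sample.groupby("country")
}
print(corr)
```

Running the same dictionary comprehension on the full trimmed table would give the coefficient for the whole period, which may differ in sign from these first five years.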

Conclusions¶

Results:¶

From the analysis carried out;

We found using the case studies that France has a higher life expectancy (82+ years) than Cameroon (63+ years).

We see that life expectancy had a steady rise for 20+ years in both Cameroon and France, which could mean that the increase in population has no negative effect on the life expectancy.

The mean GNI per capita of Cameroon is 1672.9 USD while that of France is 40265 USD; this also points to a higher standard of living in the developed countries.

We see in the analysis that, for Cameroon, life expectancy is lower in the years with GNI > 2000 USD and higher in the years with GNI < 2000 USD, whereas for France life expectancy increases as GNI increases.

In conclusion, population size does appear to have an effect on life expectancy; as we can see on the bar and scatter plots, life expectancy gets higher as the population size increases. Why this happens remains an open question, since both quantities also rise over time.

Finally, income does have an effect on life expectancy, but this effect depends on the population in question. It could be either negative (as seen in the case of Cameroon, where an increase in GNI coincides with a lower life expectancy) or positive (as in the case of France, where an increase in GNI coincides with a higher life expectancy).

Limitations:¶

There are a few limitations with our data:

The analysis is mostly descriptive. The hypotheses were stated but not formally tested, so we did not draw inferential or causal conclusions.

We worked with a limited amount of data: the presence of a good number of null values did not permit us to widen the scope of the analysis.